Skip to content

fix(addie): owner evaluate_agent_quality writes to canonical compliance state#4250

Merged
bokelley merged 6 commits intomainfrom
claude/issue-4247-owner-test-canonical-write
May 11, 2026
Merged

fix(addie): owner evaluate_agent_quality writes to canonical compliance state#4250
bokelley merged 6 commits intomainfrom
claude/issue-4247-owner-test-canonical-write

Conversation

@bokelley
Copy link
Copy Markdown
Contributor

@bokelley bokelley commented May 8, 2026

Refs #4247 — PR 1 of 4 in the compliance-state unification initiative.

Summary

Closes the 12-hour gap between an owner running evaluate_agent_quality (via Addie chat or the test_adcp_agent MCP tool) and the public /api/registry/agents/:url/compliance endpoint reflecting the result.

Root cause: evaluate_agent_quality wrote to agent_test_history / agent_contexts.last_test_* (not visible to the compliance API). The heartbeat is the only prior writer to the canonical tables, so owner-triggered results were invisible until the next 12-hour cron.

What this PR does (owner path only):

  1. Checks whether the calling org owns the agent via member_profiles.agents @> $1::jsonb JOIN organization_memberships — identical to resolveAgentOwnerOrg in registry-api.ts:4733.
  2. If owner AND compliance_opt_out is not set: calls complianceResultToDbInput() + complianceDb.recordComplianceRun() with triggered_by = 'owner_test' and dry_run = false.
  3. Adds 'owner_test' to the TriggeredBy type and both triggered_by CHECK constraints (migration 471 — covers agent_compliance_runs and agent_storyboard_status).
  4. Explicitly omits notifyComplianceChange to prevent Slack spam on iterative owner test runs.
  5. Retains the legacy agentContextDb.recordTest() write for backward compatibility (deprecated in PR 3).

Known gap documented for follow-up: AAO Verified badge state is still updated only by the heartbeat (badge issuance calls processAgentBadges which is in the heartbeat loop, not in recordComplianceRun). Compliance status reflects owner test results immediately; badge re-issuance still requires the next heartbeat. Tracked in #4247.

Non-breaking justification: Adds an optional write path to existing canonical tables with a new triggered_by = 'owner_test' value. No fields removed, no types changed, no existing writes modified. The new enum value is additive; the constraint drop-and-recreate is non-destructive DDL. Existing consumers of the compliance API gain data, not schema changes.

Pre-PR review

  • code-reviewer: approved after fixes — ownership query strengthened to join organization_memberships, compliance_opt_out guard added, dry_run: false comment added, log level on owner-check failure raised to warn
  • internal-tools-strategist: approved — ownership check (member_profiles.agents @> $1::jsonb) matches established codebase pattern; notifyComplianceChange suppression is correct per notification subscriber scope (Slack + external consumers via Scope3); badge-issuance gap acknowledged as follow-up

Triage-managed PR. This bot does not currently iterate on
review comments or PR conversation threads (only on the source
issue). To unblock:

  • Push fixup commits directly: gh pr checkout <num>
    fix → push.
  • Or re-trigger: comment /triage execute on the source
    issue.

See #3121
for context.

Session: https://claude.ai/code/session_01UNHkGhBXk9XD2dpzvSLdhb


Generated by Claude Code

@bokelley bokelley added the claude-triaged Issue has been triaged by the Claude Code triage routine. Remove to re-triage. label May 8, 2026
@EmmaLouise2018
Copy link
Copy Markdown
Contributor

Reviewing this against the updated #4247 plan. Right shape for PR 1 — owner-only canonical writes + triggered_by = 'owner_test' + compliance_opt_out guard + Slack-noise suppression all match the design. Two asks before merge to keep the public contract honest:

1. verdict_source field on the compliance response (load-bearing)

/api/registry/agents/:url/compliance is a public registry surface. What it reflects is changing, even though the field names aren't:

  • Pre-PR: "this agent's last scheduled verdict" (heartbeat-only).
  • Post-PR: "this agent's last verdict from any source" (heartbeat + owner_test).

A consumer scraping this endpoint daily learned heartbeat history; post-PR they'll see any verdict, possibly an owner running it on demand to test a fix. Tell the caller, don't guess silently. Add verdict_source: 'heartbeat' | 'owner_test' to the response so downstream filters correctly. Reads off the triggered_by of the latest agent_compliance_runs row joined to agent_compliance_status — same source that's already populated in this PR.

The OperatorLookupResultSchema analog (AgentComplianceDetailSchema or whichever is canonical for /compliance) gets the additive optional field. Frozen reporting contract argument applies; don't change semantics on the wire without a flag.

Alternative: a SHOULD-clause in the changeset spelling out the semantic shift for downstream scrapers. Pick one. Don't ship neither.

2. Last-write-wins race test (pin the contract)

Heartbeat and owner_test can fire within minutes. Two concurrent writers, one row in agent_compliance_status. The implicit rule under the current code is last-write-wins on (agent_url). State it. A test pins the rule:

// heartbeat-style write at T → status reflects T
// owner_test write at T+1s → status reflects T+1s
// owner_test at T+2s, heartbeat at T+3s → status reflects T+3s

A future refactor that switches to "first-write-wins" or "merge" would silently change the public contract. The test is the contract.

Lineage

Approving on the strength of the owner-only canonical write pattern plus compliance_opt_out guard. Two asks above before merge — both small. The badge-issuance gap acknowledged in the PR description is the right deferral; that's an issue against #4247 to track.

bokelley pushed a commit that referenced this pull request May 8, 2026
…ns test

Address review feedback from @EmmaLouise2018 on PR #4250:

1. `verdict_source` field on /api/registry/agents/:url/compliance
   — `AgentComplianceDetailSchema` gains optional `verdict_source`:
     'heartbeat' | 'owner_test' | 'manual' | 'webhook' | null
   — `getComplianceStatus` and `bulkGetComplianceStatus` join
     `agent_compliance_runs` via LATERAL subquery (dry_run=false,
     ORDER BY tested_at DESC LIMIT 1) to surface the triggered_by
     of the most recent run.  No migration needed.
   — Endpoint response emits `verdict_source: status.last_triggered_by`.
   — `AgentComplianceStatus` interface gets `last_triggered_by` field.

2. Last-write-wins contract test
   — New `compliance-db-last-write-wins.test.ts` pins the ON CONFLICT
     DO UPDATE semantics: every recordComplianceRun call overwrites
     agent_compliance_status regardless of triggered_by source.  A
     future change to first-write-wins or priority ordering would
     break these tests.

https://claude.ai/code/session_01NVVqgeSGevUGXgDbMw1XKZ
@bokelley
Copy link
Copy Markdown
Contributor Author

bokelley commented May 8, 2026

Both asks addressed in the fixup commit (0c996a6):

1. verdict_source on /api/registry/agents/:url/compliance

AgentComplianceDetailSchema gets a new optional field:

verdict_source: 'heartbeat' | 'owner_test' | 'manual' | 'webhook' | null

getComplianceStatus and bulkGetComplianceStatus now include a LATERAL subquery against agent_compliance_runs (dry_run = false, ORDER BY tested_at DESC LIMIT 1) to surface triggered_by as last_triggered_by on AgentComplianceStatus. The endpoint emits it as verdict_source. No migration needed — the data is already there.

2. Last-write-wins contract test

New compliance-db-last-write-wins.test.ts (4 cases) pins:

  • Every recordComplianceRun call issues ON CONFLICT (agent_url) DO UPDATE — never DO NOTHING.
  • triggered_by is forwarded verbatim for both heartbeat and owner_test sources.
  • getComplianceStatus returns last_triggered_by from the most recent non-dry run (asserts dry_run = false and ORDER BY tested_at DESC LIMIT 1 are in the SQL).

A future switch to first-write-wins or source-priority filtering breaks these tests.

All 3302 unit tests pass.

Session: https://claude.ai/code/session_01NVVqgeSGevUGXgDbMw1XKZ


Generated by Claude Code

@bokelley
Copy link
Copy Markdown
Contributor Author

bokelley commented May 9, 2026

Code review (expert pass): solid root, ready to merge after stripping committed dist/ files.

Block:

  • dist/schemas/onboarding-openapi.{js,d.ts,js.map,d.ts.map} are checked in. Generated artifacts shouldn't be in the diff. Remove and amend.

Nits (non-blocking):

After dist strip, mark ready and the chain unblocks (#4263 is clean too — see #4264 for the real chain blocker).

@bokelley
Copy link
Copy Markdown
Contributor Author

bokelley commented May 9, 2026

Blocker addressed — pushed c71809a0.

  • Removed dist/schemas/onboarding-openapi.{js,d.ts,js.map,d.ts.map} via git rm.
  • Added .gitignore rules for dist/schemas/*.{js,js.map,d.ts,d.ts.map} so build artifacts in that directory can't sneak back in.

Nits noted (not fixed):

Ready to mark out of draft when you are.


Triaged by Claude Code. Session: https://claude.ai/code/session_01WrFMahjaHU7y4JWqPqxTbx


Generated by Claude Code

EmmaLouise2018 pushed a commit that referenced this pull request May 9, 2026
…ns test

Address review feedback from @EmmaLouise2018 on PR #4250:

1. `verdict_source` field on /api/registry/agents/:url/compliance
   — `AgentComplianceDetailSchema` gains optional `verdict_source`:
     'heartbeat' | 'owner_test' | 'manual' | 'webhook' | null
   — `getComplianceStatus` and `bulkGetComplianceStatus` join
     `agent_compliance_runs` via LATERAL subquery (dry_run=false,
     ORDER BY tested_at DESC LIMIT 1) to surface the triggered_by
     of the most recent run.  No migration needed.
   — Endpoint response emits `verdict_source: status.last_triggered_by`.
   — `AgentComplianceStatus` interface gets `last_triggered_by` field.

2. Last-write-wins contract test
   — New `compliance-db-last-write-wins.test.ts` pins the ON CONFLICT
     DO UPDATE semantics: every recordComplianceRun call overwrites
     agent_compliance_status regardless of triggered_by source.  A
     future change to first-write-wins or priority ordering would
     break these tests.

https://claude.ai/code/session_01NVVqgeSGevUGXgDbMw1XKZ
@EmmaLouise2018 EmmaLouise2018 force-pushed the claude/issue-4247-owner-test-canonical-write branch from c71809a to 42e7f37 Compare May 9, 2026 00:09
EmmaLouise2018 added a commit that referenced this pull request May 9, 2026
PR 2 of the #4247 unification stack. Reads two fields PR #4250 added
to the compliance API but the dashboard wasn't yet rendering:

- compliance tile: appends "(your test)" / "(heartbeat)" / "(manual)"
  / "(webhook)" after Last checked, so operators see whether the
  current verdict came from their own evaluate_agent_quality run or
  the scheduled heartbeat.
- history panel: per-run badge with the same source label, info-blue
  for owner_test and neutral for the rest. Pre-PR-1 rows render with
  neutral — no regression.

No backend changes; pure UI surfacing of fields already in the API.
Stacked on PR #4250.
bokelley added a commit that referenced this pull request May 11, 2026
…o keys (#4364)

* fix(compliance): rewrite deriveStoryboardStatuses for SDK 6.x scenario keys

The compliance heartbeat has been writing zero rows to
agent_storyboard_status since the SDK switched comply() to storyboard-
driven testing. The SDK emits one TestResult per phase of each storyboard,
keyed `<storyboard_id>/<phase_id>` in result.tracks[].scenarios[].scenario
(see @adcp/sdk compliance/storyboard-tracks.ts). The old implementation
walked the YAML's per-step `comply_scenario` field (bare names like
`signals_flow`, `capability_discovery`) and looked them up in the SDK's
scenario map. Every lookup missed → testedCount === 0 → every storyboard
skipped at the `continue` guard.

Effect across the registry:
  agent_storyboard_status total rows: 6  (across 4 agents)
  rows written by triggered_by='heartbeat': 0
  rows surviving were legacy bare-name keys from old manual runs

This silently broke the AAO Verified badge pipeline (no storyboard rows
→ deriveVerificationStatus has nothing to verify against) and every
agent's dashboard `storyboards_passing: 0 / N` was misleading: the
runner wasn't failing storyboards, the parser was dropping them.

Surfaced by escalation #329: Evgeny's agent was running 30/30 scenarios
clean but showing `degraded` because specialism_status.signal-owned read
'untested' from a never-populated agent_storyboard_status row.

Fix: read SDK output directly. Group scenarios by storyboard id, roll
per-step pass counts up from each phase's `steps` array, fall back to
phase-level counts when steps are absent. The `storyboardIds` override
is preserved for explicit-IDs callers that need an `untested` entry
when the runner didn't run a requested storyboard. The unused YAML
`comply_scenario` field is no longer load-bearing for status mapping
(the SDK already knows which storyboards it ran).

Tests: 9 cases covering all-pass, partial, all-fail, phase-only fallback,
legacy bare-name skip, empty input, and explicit-IDs untested gap.

Stack note: this is orthogonal to Emma's #4247 compliance-state
unification stack (#4250, #4263, #4264, #4268, #4274) which collapses
agent_test_history into agent_compliance_runs. Different files; rebases
cleanly in either order.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(scripts): test-comply-storyboard-statuses — local harness for the fix

Runs comply() against an agent URL and prints what
deriveStoryboardStatuses would produce, without DB writes. Used to
validate the SDK-6.x scenario-key fix against real agents
(adcp-signals-adaptor.evgeny-193.workers.dev/mcp and
wonderstruck.sales-agent.scope3.com/mcp) before merging.

Will stay useful for future SDK upgrades that touch scenario emission
or storyboard-track aggregation — same pattern as the
diagnose-agent-comply-queue script from #4361.

Usage:
  npx tsx server/src/scripts/test-comply-storyboard-statuses.ts <agent-url> [<agent-url> ...]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(compliance): code review nits — clarify steps doc, hoist explicit-ids check, add 3 edge tests

Addresses code-reviewer feedback on PR #4364:
- JSDoc on deriveStoryboardStatuses now calls out that steps_passed/total
  are not directly comparable across rows (some rows are real step counts,
  some are phase-level fallbacks when the SDK omits per-step data).
- Comment pinning the storyboard-id invariant (flat ids, no `/`) so the
  indexOf split stays correct as new storyboards land.
- Defensive `result.tracks ?? []` so a malformed result doesn't throw.
- Hoist `storyboardIds && length > 0` into a single `hasExplicitIds`
  const used at both the toEmit decision and the no-data fallback.
- Three new test cases:
  * same storyboard split across multiple tracks aggregates correctly
  * result.tracks absent → []
  * non-string scenario values (null, number) → skipped without throwing

12/12 vitest passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bokelley
Copy link
Copy Markdown
Contributor Author

Heads up — PR #4364 just merged to main, which fixes deriveStoryboardStatuses to read SDK 6.x's <storyboard_id>/<phase_id> scenario keys. Pre-#4364, every call to complianceResultToDbInput → deriveStoryboardStatuses produced an empty storyboard_statuses array (so agent_storyboard_status had zero heartbeat-triggered rows registry-wide).

Without #4364 underneath, this PR's owner-test write path would have inherited the same bug — owner runs from Addie would have written canonical agent_compliance_runs rows but with empty storyboard arrays, so the public registry would still show storyboards_passing: 0/N for healthy agents tested by their owner.

With #4364 in main: owner-test writes will produce real per-storyboard data on first run after this PR ships. Worth a rebase before merge so CI exercises both fixes together.

Linking #4364 here for the chain: #4364#4250#4263#4264#4268#4274.

claude and others added 5 commits May 11, 2026 05:07
…ce state

Closes the 12-hour gap between owner-triggered storyboard runs and the public
/api/registry/agents/:url/compliance endpoint (issue #4247, PR 1 of 4).

When evaluate_agent_quality is triggered by the agent owner, the result is now
written to agent_compliance_status + agent_compliance_runs + agent_storyboard_status
with triggered_by = 'owner_test'. Non-owner runs continue writing to agent_test_history
(deprecated in PR 3). Migration 471 adds 'owner_test' to both triggered_by CHECK
constraints. notifyComplianceChange is intentionally suppressed for owner runs to
prevent iteration-loop Slack spam.

https://claude.ai/code/session_01UNHkGhBXk9XD2dpzvSLdhb
…ns test

Address review feedback from @EmmaLouise2018 on PR #4250:

1. `verdict_source` field on /api/registry/agents/:url/compliance
   — `AgentComplianceDetailSchema` gains optional `verdict_source`:
     'heartbeat' | 'owner_test' | 'manual' | 'webhook' | null
   — `getComplianceStatus` and `bulkGetComplianceStatus` join
     `agent_compliance_runs` via LATERAL subquery (dry_run=false,
     ORDER BY tested_at DESC LIMIT 1) to surface the triggered_by
     of the most recent run.  No migration needed.
   — Endpoint response emits `verdict_source: status.last_triggered_by`.
   — `AgentComplianceStatus` interface gets `last_triggered_by` field.

2. Last-write-wins contract test
   — New `compliance-db-last-write-wins.test.ts` pins the ON CONFLICT
     DO UPDATE semantics: every recordComplianceRun call overwrites
     agent_compliance_status regardless of triggered_by source.  A
     future change to first-write-wins or priority ordering would
     break these tests.

https://claude.ai/code/session_01NVVqgeSGevUGXgDbMw1XKZ
…rtifacts

Generated JS/TS files don't belong in source control. Also adds
.gitignore rules for dist/schemas/*.{js,d.ts,*.map} to prevent recurrence.

https://claude.ai/code/session_01WrFMahjaHU7y4JWqPqxTbx
@bokelley bokelley force-pushed the claude/issue-4247-owner-test-canonical-write branch from 54a4f16 to f7f933b Compare May 11, 2026 09:10
Six changes responding to expert review:

1. Renumber migration 472 → 475 (collision with 472_drop_member_profiles_
   primary_brand_domain.sql on main since this PR was opened).

2. Drop `verdict_source` from the public /api/registry/agents/:url/compliance
   response. Heartbeat and owner_test both call comply() against the same
   registered URL with the same owner-saved credentials; the verdict's
   truth content is identical regardless of who pulled the trigger.
   Surfacing the source label to buyers creates a trust distinction the
   underlying observation doesn't carry. `triggered_by` stays as internal
   audit on agent_compliance_runs; Emma's #4263 dashboard surface keeps
   the operator UX cue.

3. Drop the legacy `recordComplianceRun(... 'manual')` write inside
   evaluate_agent_quality. Pre-PR, that write was invisible publicly.
   Post-PR (with verdict_source on the response), it would have leaked
   `verdict_source: 'manual'` for non-owner runs — security-reviewer's B1.
   Even with verdict_source removed (item 2), the legacy write was
   gated only on `agent_contexts` row existence — which `save_agent`
   lets any org create for any URL without an ownership check — so a
   non-owner could publish a verdict on someone else's agent.
   The owner-test branch covers the dashboard-freshness use case with a
   real ownership check; the legacy public-state write has no remaining
   function. Legacy agent_test_history write retained as session-scoped
   audit until Emma's PR 3 cleans up.

4. Dashboard "Run this storyboard" endpoint now writes `triggered_by =
   'owner_test'` instead of `'manual'` (owner-only path, consistent with
   the new semantic). Legacy `'manual'` value remains in the enum for
   historical rows.

5. Rate limit on evaluate_agent_quality via existing Addie tool rate
   limiter (default 60/10min). comply() itself takes 10-60s per run so
   the natural ceiling is already ~1-2/min — this adds a hard wall above
   that. Security-reviewer's B2.

6. Updated changeset content + migration number references; replaces
   stale "migration 471" prose from earlier renumber.

Out of scope (deferred follow-ups):
- Badge issuance on owner_test transitions: owner currently waits up to 1h
  for next heartbeat to re-issue badge. Skipped because the badge fan-out
  in compliance-heartbeat.ts is ~70 lines and extracting/sharing is its
  own refactor.
- Extracting resolveAgentOwnerOrg to a shared helper: same drift-surface
  call applies, kept inline to bound this PR's scope.
- Active-membership filter: organization_memberships has no status
  column (row deletion = removal), so the security-review M2 concern
  doesn't apply at the schema layer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bokelley
Copy link
Copy Markdown
Contributor Author

Review iteration — addressed security + correctness feedback

Pushed 828ea1d with the following changes:

Blockers

  • ✅ Migration renumbered 472 → 475 (collision with 472_drop_member_profiles_primary_brand_domain.sql on main)
  • ✅ Changeset updated (was referencing stale "migration 471")
  • ✅ Security B1 — Dropped the legacy recordComplianceRun(... 'manual') write inside evaluate_agent_quality. That path was gated only on agent_contexts row existence (which save_agent lets any org create for any URL with no ownership check), so a non-owner could publish a verdict_source: 'manual' on someone else's agent.
  • ✅ Security B2 — Rate limit on evaluate_agent_quality via existing Addie tool rate limiter (default 60/10min). comply() itself takes 10-60s per run.

Trust-model fix

  • Dropped verdict_source from the public /api/registry/agents/:url/compliance response entirely. Discussion with Brian: heartbeat and owner_test both call comply() against the same registered URL with the same owner-saved credentials; the verdict's truth content is identical regardless of who pulled the trigger. Exposing the source label to buyers creates a trust distinction the underlying observation doesn't actually carry. triggered_by stays as internal audit on agent_compliance_runs; Emma's feat(dashboard): surface verdict_source + per-run triggered_by badge #4263 dashboard surface keeps the operator UX cue.

Consistency fix

  • ✅ Dashboard "Run this storyboard" endpoint now writes triggered_by = 'owner_test' instead of 'manual' (owner-only path; consistent with the new semantic).

Out of scope — follow-ups, not part of this PR

  • Badge issuance on owner_test transitions: owner currently waits up to 1h for next heartbeat to re-issue badge. The badge fan-out in compliance-heartbeat.ts is ~70 lines and sharing it requires extracting to a helper — its own refactor.
  • Extracting resolveAgentOwnerOrg to a shared helper: real drift surface, but keeping inline to bound this PR's scope.

N/A

  • Active-membership filter: organization_memberships has no status column (row deletion = removal), so security-review M2 doesn't apply at the schema layer.

Ready for re-review.

@bokelley
Copy link
Copy Markdown
Contributor Author

Re-review (post-828ea1d)

Two experts (code-reviewer + security-reviewer) reviewed the updated diff. No ship-blockers.

Confirmed addressed:

  • B1 (unguarded 'manual' write): member_profiles.agents @> $2::jsonb JOIN organization_memberships closes the hole — an org with only an agent_contexts row for a URL they don't own will fail the ownership query. compliance_opt_out guard is present on the canonical-write path.
  • B2 (rate limit): checkToolRateLimit at the handler top is correct; the 10–60 s/run natural ceiling means the hard wall is belt-and-suspenders.
  • Trust model: last_triggered_by is NOT serialized in the public /api/registry/agents/:url/compliance route handler (confirmed — that handler enumerates fields explicitly). Decision and rationale well-documented in the comment block.
  • CODE_VERSION: bumped to 2026.05.1
  • Both getComplianceStatus and the bulk variant updated with the LATERAL join ✓

Nits (non-blocking, your call):

  1. Migration 474 gap. main tops at 473_users_primary_organization_id_fk.sql; the PR uses 475_, permanently skipping 474_. If this was intentional — avoiding a known in-flight PR — no action needed. Otherwise, rename to 474_.

  2. workosUserId shadow. The const workosUserId = ... inside if (organizationId) re-declares the same value already in scope from the rate-limit check above. Delete the inner declaration and use the outer one.

  3. LATERAL missing partial index. The existing (agent_url, tested_at DESC) index doesn't filter on dry_run; Postgres will scan all rows for the agent before satisfying LIMIT 1 once dry-run rows accumulate. Adding to migration 475:

    CREATE INDEX CONCURRENTLY idx_compliance_runs_nondry
      ON agent_compliance_runs(agent_url, tested_at DESC)
      WHERE dry_run = false;

    backs the LATERAL efficiently without a table lock.

  4. owner_test visible in run history. The public /api/registry/agents/:url/compliance/history endpoint emits triggered_by per run, so external callers can see when an owner self-tested. Low-signal disclosure (reveals "owner ran a test," not anything beyond the already-public verdict), but worth confirming the behavior is intended.


Triaged by Claude Code. Session: https://claude.ai/code/session_014mrjryUmFVVrB2BzDhgMZR


Generated by Claude Code

@bokelley bokelley marked this pull request as ready for review May 11, 2026 09:45
@bokelley bokelley merged commit 45f3371 into main May 11, 2026
15 checks passed
@bokelley bokelley deleted the claude/issue-4247-owner-test-canonical-write branch May 11, 2026 09:45
bokelley pushed a commit that referenced this pull request May 11, 2026
PR 2 of the #4247 unification stack. Reads two fields PR #4250 added
to the compliance API but the dashboard wasn't yet rendering:

- compliance tile: appends "(your test)" / "(heartbeat)" / "(manual)"
  / "(webhook)" after Last checked, so operators see whether the
  current verdict came from their own evaluate_agent_quality run or
  the scheduled heartbeat.
- history panel: per-run badge with the same source label, info-blue
  for owner_test and neutral for the rest. Pre-PR-1 rows render with
  neutral — no regression.

No backend changes; pure UI surfacing of fields already in the API.
Stacked on PR #4250.
bokelley pushed a commit that referenced this pull request May 11, 2026
PR 2 of the #4247 unification stack. Reads two fields PR #4250 added
to the compliance API but the dashboard wasn't yet rendering:

- compliance tile: appends "(your test)" / "(heartbeat)" / "(manual)"
  / "(webhook)" after Last checked, so operators see whether the
  current verdict came from their own evaluate_agent_quality run or
  the scheduled heartbeat.
- history panel: per-run badge with the same source label, info-blue
  for owner_test and neutral for the rest. Pre-PR-1 rows render with
  neutral — no regression.

No backend changes; pure UI surfacing of fields already in the API.
Stacked on PR #4250.
bokelley pushed a commit that referenced this pull request May 11, 2026
…4263)

PR 2 of the #4247 unification stack. Reads two fields PR #4250 added
to the compliance API but the dashboard wasn't yet rendering:

- compliance tile: appends "(your test)" / "(heartbeat)" / "(manual)"
  / "(webhook)" after Last checked, so operators see whether the
  current verdict came from their own evaluate_agent_quality run or
  the scheduled heartbeat.
- history panel: per-run badge with the same source label, info-blue
  for owner_test and neutral for the rest. Pre-PR-1 rows render with
  neutral — no regression.

No backend changes; pure UI surfacing of fields already in the API.
Stacked on PR #4250.
bokelley added a commit that referenced this pull request May 11, 2026
closes #4377) (#4388)

PR #4250's review flagged the agent-ownership JOIN as a drift surface
— duplicated inline in two places with subtly different semantics:

- registry-api.ts:4825 `resolveAgentOwnerOrg(userId, agentUrl)` —
  "any org the user is a member of that owns the agent"
- member-tools.ts:3588 (inline) — "the resolved member-context org IS
  the owning org" (tighter predicate that adds the org_id constraint)

Both queries join `member_profiles.agents` against `organization_
memberships` — same canonical relation, different selectivity.

This PR extracts both to `server/src/services/agent-ownership.ts`:

- `findOwnerOrgForUser(userId, agentUrl): Promise<string | null>` —
  registry-api.ts's "discover ownership" use case
- `isOrgOwnerOfAgent(orgId, userId, agentUrl): Promise<boolean>` —
  member-tools.ts's "confirm specific org" use case

Both call sites updated to use the helpers. `resolveAgentOwnerOrg` is
retained as a closure-scoped alias inside the registry-api factory so
existing call sites don't need to thread the import.

Note on active-membership filtering: `organization_memberships` has
no status column in this schema — removed members get their row
deleted, not status-flipped. Row existence is the membership signal.
Documented in the helper file's module doc.

Tests (7 passing):
- findOwnerOrgForUser returns org_id, null on no match, null on throw
- isOrgOwnerOfAgent returns true/false correctly
- semantic distinction pinned: findOwnerOrgForUser uses 2 params (no
  org filter); isOrgOwnerOfAgent uses 3 params (org_id constraint)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

claude-triaged Issue has been triaged by the Claude Code triage routine. Remove to re-triage.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants